The Evolution of the Massively Parallel Processing Database in Support of Visual Analytics

Author

  • Ian A. Willson
Abstract

This article explores the evolution of the Massively Parallel Processing (MPP) database, focusing on trends of particular relevance to analytics. The dramatic shift of database vendors and leading companies toward utilizing MPP databases and deploying an Enterprise Data Warehouse (EDW) is presented. The inherent benefits of fresher data, storage efficiency and, most importantly, accessibility to analytics are explored. Published industry and vendor metrics that demonstrate substantial and growing cost efficiencies from utilizing MPP databases are examined. The author concludes by reviewing trends toward parallelizing decision support workload into the database, ranging from in-database transformations to new statistical and spatial analytic capabilities provided by parallelizing these algorithms to execute directly within the MPP database. These new capabilities present an opportunity for timely and powerful enterprise analytics, providing a substantial competitive advantage to those companies able to leverage this technology to turn data into actionable information, gain valuable new insights, and automate operational decision making.

Information Resources Management Journal, 24(4), 1-26, October-December 2011. DOI: 10.4018/irmj.2011100101. Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

... increasingly volatile and complex operating environments and yet are unable to integrate and analyze their data to make timely operational, tactical and strategic decisions. The focus of this article is on advances in MPP database technology, representing the decision support system component within the broader Systems of Systems architecture that spans data and information throughout the enterprise. Much of the history of MPP, relational database technology and the scientific computing applications of MPP will not be considered, although some of it may be better known to readers than MPP database advances. Our focus is on advances in data warehouse technology that represent an historic opportunity for those in the analytics field, who until recently have relied on a variety of purpose-built stand-alone tools or methods of limited capability that increasingly compete for scarce Information Technology (IT) resources. To achieve the full potential of analytics on large or complex data, an examination of recent MPP database trends is in order.

This article presents a brief history of MPP databases and their role in decision support, with an unavoidable focus on Teradata Corporation, which commercialized MPP and remains a market leader (Feinberg & Beyer, 2008). Significant downward trends in price per unit of performance, both absolute and relative to transactional non-MPP databases, are examined. The synergy of MPP technology and the activated Enterprise Data Warehouse (EDW) makes it possible to support wide-ranging analytical queries on a single instance of time-referenced data. The EDW provides an authoritative source of reference ideally suited to analytics, particularly the emerging trend of ‘analytics inside’, or pushing down parallelized algorithms based on temporal, spatial and statistical primitives executed directly within the MPP database, greatly increasing the accessibility and potential value of these analytical methods. The result is a cost-effective delivery system providing powerful analytics to the enterprise, generating business value from IT investments.

THE MASSIVELY PARALLEL PROCESSING DATABASE

With a focus on analytics, most of us have not been concerned with the precise details of the systems that host our data. To start, let us define our domain of interest to be broad enterprise-scale relational databases focused on supporting analytics and queries of all types, rather than transaction processing.
The focus of this article is on performing a wide variety of valued analytics against such databases using an architecture specifically created to process this workload. MPP technology has been successfully deployed in large database systems for more than 25 years, supporting, with varying degrees of automation, many of the types of analytics we perform today.

The author has been working with very large relational databases for decision support and analytical applications since 1986. In an early example, an entire IBM® 4081 mainframe was used to host a Human Resources decision support system. Since that time, what constitutes large has changed, as has the cost per unit of storage or unit of power, but surprisingly the fundamental method of storing and retrieving structured data has changed little. The storage and retrieval of normalized data in tables first contemplated by Dr. Ted Codd (1970) is increasingly practical now that a suitably powerful hardware platform and database architecture are available. The various design mitigations previously required for performance are fading in prevalence.

The one fundamental change since 1970 is the commercial introduction of the shared-nothing or Massively Parallel Processing (MPP) database by Teradata Corporation, which shipped its first production system in 1984. One important distinction in this application of MPP to relational databases is that the parallelization is applied to the operators of Structured Query Language (SQL), a fundamentally functional language. Better-known applications of MPP, at least 20 years ago, were in the scientific computing realm, where massive parallelization was thought to offer the opportunity to speed up calculations for a wide range of high-value scientific computing problems. Thinking Machines Corporation was one of the most widely known parallel supercomputer manufacturers of the 1980s and early 1990s until its bankruptcy in 1994 (Taubes, 1995).
Executing scientific calculations in parallel on these MPP systems required special language extensions, such as C* and CM Fortran. Ultimately this approach was not cost-effective, with more economical designs for moderate parallelism utilizing specialized clusters of SMP-based computers succeeding in the commercial mar...


Similar resources

Text Analytics of Customers on Twitter: Brand Sentiments in Customer Support

Brand community interactions and online customer support have become major platforms of brand sentiment strengthening and loyalty creation. Rapid brand responses to each customer request through inbound tweets on Twitter and taking proper actions to cover the needs of customers are the key elements of positive brand sentiment creation and product or service initiative management in the realm of ...

Evolution of texture in an ultrafine and nano grained magnesium alloy

The evolution of texture was discussed during the formation of ultra-fine and nano grains in a magnesium alloy severely deformed through accumulative back extrusion (ABE). The microstructure and texture obtained after applying multiple deformation passes at temperatures of 100 and 250°C were characterized. The results showed that after single ABE pass at 100°C an ultrafine/nano grained microstr...

Parallel processing in human audition and post-lesion plasticity

Recent activation and electrophysiological studies have demonstrated that sound recognition and localization are processed in two distinct cortical networks that are each present in both hemispheres. Sound recognition and/or localization may be, however, disrupted by purely unilateral damage, suggesting that processing within one hemisphere may not be sufficient or may be disturbed by the contr...

Optimization of Common Table Expressions in MPP Database Systems

Big Data analytics often include complex queries with similar or identical expressions, usually referred to as Common Table Expressions (CTEs). CTEs may be explicitly defined by users to simplify query formulations, or implicitly included in queries generated by business intelligence tools, financial applications and decision support systems. In Massively Parallel Processing (MPP) database syst...

Journal title:
  • IRMJ

Volume 24, Issue 4

Pages 1-26

Publication date 2011